Exploiting Delayed Synchronization Arrivals in Light-Weight Data Parallelism
Author
Abstract
SPMD (Single Program Multiple Data) and other traditional models of data parallelism provide parallelism at the processor level. Barrier synchronization is defined at the level of processors: when a processor arrives at the barrier point early and waits for others to arrive, no other useful work is done on that processor. Program restructuring is one way of minimizing such latencies; however, restructured programs tend to be error-prone and less portable. In this paper we discuss how multithreading can be used in data parallelism to mask delays due to application irregularity or processor load imbalance. The discussion is in the context of Coir, our object-oriented runtime system for parallelism, and concentrates on shared-memory systems. The sample application is an LU factorization algorithm for skyline sparse matrices. We discuss performance results on an IBM PowerPC-based symmetric multiprocessor system.
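The abstract does not show Coir's actual scheduler interface, so the following is only a minimal Python sketch of the core idea: instead of assigning one static block of work per processor and waiting at a barrier, each processor runs many lightweight tasks pulled from a shared queue, so a processor that finishes early immediately starts another task rather than idling. The names (`factor_block`, `run_multithreaded`) are illustrative, not part of Coir.

```python
import queue
import threading

def factor_block(block):
    # Stand-in for the real per-block numerical work
    # (e.g. factoring one block of a skyline sparse matrix).
    return sum(block)

def run_multithreaded(blocks, num_workers=4):
    tasks = queue.Queue()
    for i, b in enumerate(blocks):
        tasks.put((i, b))
    results = [None] * len(blocks)

    def worker():
        while True:
            try:
                i, b = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained: exit instead of spinning at a barrier
            results[i] = factor_block(b)  # each index written by one task only

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a static SPMD partition, an irregular workload (blocks of very different sizes) leaves early finishers stalled at the barrier; with the dynamic task queue above, no worker idles while work remains.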
Similar resources
Object Template Abstractions for Light-Weight Data-Parallelism
Data-parallelism is a widely used model for parallel programming. Control structures like parallel DO loops, and data structures like collections have been used to express data-parallelism. In typical implementations, these constructs are 'flat' in the sense that only one data-parallel operation is active at any time. To model applications that can exploit overlap of synchronization and computat...
Exploiting Multi-Grained Parallelism for Multiple-Instruction-Stream Architectures
Exploiting parallelism is an essential part of maximizing the performance of an application on a parallel computer. Parallelism is traditionally exploited at two granularities: individual operations are executed in parallel within a processor to exploit instruction-level parallelism and loop iterations or processes are executed in parallel on different processors to exploit loop-level paralleli...
Maximizing Loop
Loop fusion is a program transformation that merges multiple loops into one. It is effective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper presents three results for fusion: (1) a new algorithm for fusing a collection of parallel and sequential loops, minimizing parallel loop synchronization while maximizing parallelism; (2) a proof that ...
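The fusion transformation described in this excerpt can be illustrated with a small sketch (plain Python stand-ins for parallel loops; in a real data-parallel setting each loop would end in a barrier, so fusing two loops into one removes one synchronization point and keeps each element in cache between the two operations):

```python
def unfused(a):
    # Two separate loops: in a parallel implementation, a barrier
    # would separate them, and b is written to memory then re-read.
    b = [x * 2 for x in a]
    c = [x + 1 for x in b]
    return c

def fused(a):
    # One fused loop: one synchronization point, and each element
    # is transformed while still in registers/cache.
    return [x * 2 + 1 for x in a]
```

Both versions compute the same result; fusion changes only when synchronization happens and how data flows through the memory hierarchy.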
The Potential of Exploiting Coarse-Grain Task Parallelism from Sequential Programs
Research into automatic extraction of instruction-level parallelism and data parallelism from sequential languages by compilers has been going on for many years. However, task parallelism has been almost unexploited by parallelizing compilers. It has been shown that coarse-grain task parallelism is a useful additional resource of parallelism for multiprocessors, but the simple and restricted ex...
Optimizing Data Parallel Operations on Many-Core Platforms
Data parallel operations are widely used in game, multimedia, physics and data-intensive and scientific applications. Unlike control parallelism, data parallelism comes from simultaneous operations across large sets of collection-oriented data such as vectors and matrices. A simple implementation can use OpenMP directives to execute operations on multiple data concurrently. However, this implem...
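The excerpt above describes data parallelism as simultaneous operations across large collections, partitioned among workers. The excerpt mentions OpenMP directives; as a language-neutral illustration, here is a small Python sketch (the names `scale_chunk` and `parallel_scale` are ours) that splits a collection into chunks and applies the same elementwise operation to each chunk concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def scale_chunk(chunk):
    # The same operation applied to every element of one partition.
    return [x * 2.0 for x in chunk]

def parallel_scale(data, workers=4):
    # Partition the collection into roughly equal chunks, one per worker.
    n = max(1, len(data) // workers)
    chunks = [data[i:i + n] for i in range(0, len(data), n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        out = []
        # pool.map preserves chunk order, so results concatenate cleanly.
        for part in pool.map(scale_chunk, chunks):
            out.extend(part)
    return out
```

This is the same pattern an `#pragma omp parallel for` expresses in C/C++: the runtime divides the iteration space among workers and joins their results.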
Publication year: 1997